Methods for Assembling DNA Molecules

ABSTRACT

The invention provides compositions and methods for assembling a DNA molecule having a desired sequence. The methods involve contacting a DNA polymerase, dNTPs, and a plurality of pairs of oligonucleotides. The oligonucleotides of a pair have a portion of the desired sequence, and an internal sequence that overlaps and is complementary to an internal sequence of the other oligonucleotide of the pair, and, when arranged in order, they have at least a portion of the desired sequence. The oligonucleotides also have a 3′ or a 5′ primer binding sequence having a binding site for a primer. The oligonucleotides that correspond to the end oligonucleotides of the desired sequence also have a universal 3′ flanking sequence and a universal 5′ flanking sequence, respectively.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. application Ser. No. 15/839,597 filed Dec. 12, 2017, now issued as U.S. patent Ser. No. 11/060,137; which claims the benefit under 35 USC § 119(e) to U.S. Application Ser. No. 62/434,300 filed Dec. 14, 2016, now expired. The disclosure of each of the prior applications is considered part of and is incorporated by reference in the disclosure of this application.

BACKGROUND OF THE INVENTION

Field of the Invention

The invention pertains to processes and compositions for the assembly of nucleic acid molecules.

Background Invention

The synthesis and assembly of DNA molecules remain critical technologies in the field of molecular biology. In particular, the ability to readily assemble multiple double-stranded DNA molecules in a correct order to yield a functional gene or nucleic acid construct (e.g., a vector) is presently an important objective in molecular biology. Existing techniques often involve the parallel synthesis of oligonucleotide fragments with a subsequent assembly of the fragments into a larger DNA molecule. Many techniques also require the identification and use of restriction sites that are cleaved by a particular restriction enzyme and serve as sites for integration of a sequence of interest. This may be followed by cloning in a suitable host to yield the final nucleic acid construct. While these techniques are often useful, the user encounters difficulty with the assembly of a construct containing multiple genes of interest. A number of additional techniques have been developed to circumvent these difficulties. These solutions often involve some method of disrupting restriction sites. These techniques are also often labor intensive, such as splicing by overlap extension or other methods of generating single-stranded overhangs.

Other conventional methods of oligonucleotide assembly rely on partitioned or array-based oligos that are pooled and assembled together without regard to how the oligos may interact with each other in the reaction as a whole. Often, array-based oligos are amplified as single oligos and then flanks are removed using restriction enzymes, which can leave partially inhibiting “scar” sequences on the DNA. These scar sequences can make it difficult to assemble the oligos into higher order DNA assemblies.

It would therefore be useful to have methods of assembling DNA molecules that are easy to use, not labor intensive, and that do not leave “scar” sequences on the DNA.

SUMMARY OF THE INVENTION

The invention provides methods for the assembly of DNA molecules having a desired sequence. In one embodiment a plurality of pairs of overlapping oligonucleotides are amplified (e.g., via PCR) in a method that leverages conserved primer binding sequences at the termini of each oligonucleotide in the pair. The conserved flanking sequences prohibit the oligonucleotide pairs from assembling with one another to form the DNA molecule of desired sequence at an initial stage, and can also serve as 3′ and/or 5′ primer binding sequences. After forming couplets and amplifying the oligonucleotide pairs, the conserved flanking (primer binding) sequences that inhibit assembly are removed, for example by using a process of scarless flank removal (SFR). The oligonucleotides can then be assembled (e.g., by a PCR reaction or a variant thereof) into the DNA molecule of desired sequence. The pairs of oligonucleotides that comprise the end oligonucleotides of the DNA molecule of desired sequence can also comprise additional, universal 3′ and 5′ flanking sequences, which can be utilized for amplification of the DNA molecule of desired sequence after assembly. The methods are useful for producing DNA molecules de novo and with very low occurrences of nucleotide errors. Constructs produced by the methods can also be subsequently assembled into larger DNA molecules, by the same method or using other methods. In various embodiments nucleic acid molecules assembled by the methods can be further assembled into larger nucleic acid molecules by GIBSON ASSEMBLY®, or other DNA assembly techniques.

In a first aspect the invention provides methods for assembling a DNA molecule having a desired sequence. The methods involve a step of a) contacting a DNA polymerase, dNTPs, and a plurality of pairs of oligonucleotides, wherein each oligonucleotide of a pair comprises a portion of the desired sequence, and the oligonucleotides of a pair comprise an internal sequence that overlaps and is complementary to an internal sequence of the other oligonucleotide of the pair. When the plurality of pairs of oligonucleotides is schematically or illustratively arranged in order, adjacent to each other and according to their internal sequences, they comprise at least a portion of the desired sequence. The oligonucleotides can also have a 3′ or a 5′ primer binding sequence. The desired sequence has a 3′ end and a 5′ end, and the oligonucleotide pairs that make up the 3′ and 5′ ends of the desired sequence can also have a universal 3′ flanking sequence and a universal 5′ flanking sequence, respectively. The methods can also involve a step of b) performing a first amplification reaction on the plurality of pairs of oligonucleotides, and a step c) of removing the 3′ and 5′ primer binding sequences from the plurality of pairs of oligonucleotides; and a step d) of subjecting the plurality of pairs of oligonucleotides to an assembly reaction to assemble the dsDNA molecule having the desired sequence. In some embodiments the methods are performed in the order of steps recited herein, e.g., steps a-d. The methods can further involve one or more step(s) of forming a plurality of couplets from the plurality of pairs of oligonucleotides, which can be done prior to the first amplification step.

In some embodiments the first amplification reaction further contains primers that bind to the 3′ and 5′ primer binding sequences. When the universal 3′ flanking sequence and the universal 5′ flanking sequence are used, they can be present to the inside of the 3′ and 5′ primer binding sequences, respectively. In some embodiments the methods can involve a step of removing at least a portion of the universal 3′ flanking sequence and the universal 5′ flanking sequence. The first amplification reaction can be PCR, PCA, a variant procedure of PCR or PCA, or any DNA amplification method and can have a denaturation phase, an annealing phase, and an extension phase. The methods can also involve a second, and optionally additional, amplification reactions.

In some embodiments the 5′ and 3′ primer binding sequences on an oligonucleotide pair are not complementary to each other. The method can assemble a plurality of DNA molecules of desired sequences, which in some embodiments are a plurality of distinct genes. The pairs of oligonucleotides can be comprised on a solid support, for example a nucleic acid array. In some embodiments at least 15 pairs of oligonucleotides or couplets are present on the array. The oligonucleotides or couplets can comprise from 60 to 100 nucleotides and in some embodiments the primer binding sequences can have from 8 to 30 nucleotides. In some embodiments the dsDNA molecule assembled is a nucleic acid construct, for example a plasmid, an artificial chromosome, or a functional gene.

In some embodiments a first pair of oligonucleotides or couplets has a first set of 3′ or 5′ primer binding sequences, and a second pair of oligonucleotides or couplets has a second set of 3′ or 5′ primer binding sequences, i.e., the pairs of oligonucleotides or couplets contain at least two sets of 3′ or 5′ primer binding sequences, but can contain multiple sets. In some embodiments the 3′ and 5′ primer binding sequences do not have a restriction site for a restriction enzyme. In some embodiments the 3′ and 5′ primer binding sequences are removed by the action of one or more enzymes that specifically cleave the primer binding sequences, and the one or more enzymes need not include a restriction enzyme. In some embodiments the one or more enzymes contain uracil DNA glycosylase, endonuclease VIII, or exonuclease T, or a combination of enzymes containing one, or two, or all three of them.

In some embodiments the desired DNA molecule is a pre-determined sequence. In some embodiments the plurality of oligonucleotide pairs can be contained in a single container and form at least two couplets, and the at least two couplets can have distinct sets of 3′ or 5′ primer binding sequences and the primers can bind specifically to the at least two couplets. Any of the methods disclosed herein can involve a step of forming the plurality of pairs of oligonucleotides into a plurality of couplets. Any of the methods can be performed in a single container, and any of the methods can be an automated method. In various embodiments the oligonucleotides that make up the pairs of oligonucleotides are from about 50 to about 200 nucleotides in length. The methods can assemble a dsDNA molecule, which can be of a size greater than 5 Mbp. In any of the methods the nucleic acid molecule of desired sequence can be a scarless DNA molecule.

In another aspect the invention provides compositions containing a DNA polymerase, dNTPs, and a plurality of oligonucleotides formed into couplets, which can be any plurality of oligonucleotide pairs or couplets described herein. Each couplet can contain an internal sequence that comprises a portion of the desired nucleic acid sequence, and when the plurality of couplets is arranged in schematic order, adjacent to each other and according to their internal sequences, they comprise at least a portion of a desired nucleic acid sequence. Each couplet can also have a 3′ or a 5′ primer binding sequence, and each couplet contains a sequence that overlaps and is complementary to a portion of a sequence from an adjacent couplet. The desired nucleic acid sequence has a 3′ end and a 5′ end, and in some embodiments the couplets that make up the 3′ and 5′ ends of the desired sequence can also have a universal 3′ flanking sequence and a universal 5′ flanking sequence, respectively. The composition can be contained in a single container and can, optionally, also contain an effective amount of a preservative. In various embodiments at least 50% of the oligonucleotides in the mixture can be present in a couplet. In some embodiments the couplets overlap at least 33% of their sequences.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic illustration of an amplification and/or assembly reaction of the invention.

FIGS. 2A and 2B present an illustration of the gel produced from the assembly method of the invention in which 500 bp constructs were assembled, as described in Example 1.

FIG. 3 is a schematic illustration showing the use of alternating sets of 3′ and 5′ primer binding sequences, which have primer binding sites.

FIG. 4 is a schematic illustration of scarless flank removal.

FIG. 5 presents an illustration of the gel produced from various embodiments of the assembly method of the invention for the assembly of a 1.2 kb product, as described in Example 2. The gel presents the results of PCA assembly after scarless flank removal. Lane 1: Pool 3; Lane 2: Pool 1; Lane 3: Pool 2.

FIG. 6 provides a graphical illustration demonstrating error suppression achieved from the use of the present methods. It demonstrates error rates 40%-72% lower in the overlapping regions of oligonucleotide pairs versus non-overlapping regions. An overall error frequency of 0.51% was achieved, illustrating that the methods result in nucleotide error correction.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides methods for amplifying and/or assembling a DNA molecule having a desired sequence. The methods involve contacting a DNA polymerase, dNTPs, and a plurality of pairs of oligonucleotides to form a mixture. Each oligonucleotide of a pair has a portion of the desired sequence, and the oligonucleotides of a pair also have an internal sequence that overlaps and is complementary to an internal sequence of the other oligonucleotide of the pair. The oligonucleotide pairs can at least partially bind to form couplets. When the plurality of pairs of oligonucleotides are arranged in order, adjacent to each other and according to their internal sequences, they comprise at least a portion of the desired sequence. The oligonucleotides of a pair or couplet also have a 3′ or a 5′ primer binding sequence for amplifying the pair and discouraging inappropriate annealing. The nucleic acid of desired sequence has a 3′ end and a 5′ end, and the oligonucleotide pairs or couplets that correspond to the 3′ and 5′ ends of the nucleic acid of desired sequence can additionally have a universal 3′ flanking sequence and a universal 5′ flanking sequence, respectively, which is useful in downstream assembly and/or amplification. Any of the methods disclosed herein can involve one or more steps of forming a plurality of couplets from the plurality of pairs of oligonucleotides. In some embodiments the methods involve performing a first amplification reaction on the plurality of pairs of oligonucleotides or couplets in the mixture; removing the 3′ and 5′ primer binding sequences from the plurality of pairs of oligonucleotides or couplets; optionally subjecting the plurality of pairs of oligonucleotides or couplets to a second amplification reaction using the universal 3′ and 5′ flanking sequences; and subjecting the plurality of pairs of oligonucleotides or couplets to an assembly reaction to thereby assemble the dsDNA molecule having the desired sequence.

The invention provides a number of advantages over other methods of assembling DNA molecules. By pairing oligonucleotides into “couplets” the overall complexity of the reactions is effectively reduced, allowing for a more robust assembly reaction because the oligo pairs can find their complementary partners with a higher likelihood and with greater specificity than if not coupled. DNA assembly procedures are more effective when longer oligos are used as building blocks, but array-based oligos can accumulate an unacceptable number of errors during the synthesis process. It has therefore been necessary to balance synthesis with oligos that are short enough to keep errors low yet long enough to enable efficient DNA assembly. By pairing the oligos together according to the invention to form couplets, their complexity is reduced and they are effectively “lengthened,” and the need to produce excessively long oligos at the outset of an assembly procedure is eliminated.

During the oligo design and synthesis process (e.g., on an array) conserved flanking sequences, which can be 3′ and 5′ primer binding sites, are introduced into the oligos on the 3′ and/or 5′ ends of each oligo. The conserved flanking sequences present at the end of each oligo in a pair can be different and non-complementary. The flanking sequences prevent the oligo pairs from interacting with each other and annealing to each other, due to unfavorable thermodynamic factors. Thus, the overall complexity of the DNA assembly reaction is reduced. The use of distinct 3′ and/or 5′ primer binding sequences on specific oligo pairs or couplets allows for the selective amplification of specific oligo pairs or couplets in a complex mixture of starting oligonucleotides. When starting oligonucleotide pools are aliquoted to separate containers, a corresponding primer (or primer pair) can be placed into each container to selectively amplify the oligonucleotide pair that corresponds to the primer or primer pair used. Thus, one can selectively amplify the desired portion of the nucleic acid molecule to be assembled in each separate container. At the downstream assembly step of the method the flanking sequences can be removed, for example by scarless flank removal (SFR) as described herein, prior to assembly of the DNA molecule of desired sequence.
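The selective amplification described above can be illustrated in silico. The following Python sketch is offered for illustration only and is not part of the disclosed methods. It assumes that each couplet has already been extended to a double-stranded product whose top strand begins with one primer sequence and ends with the reverse complement of the other primer. The pool contents, the primer sequences, and the names reverse_complement, is_amplifiable, and select_couplets are hypothetical.

```python
# Hypothetical model: a primer pair "amplifies" a template only when the
# template's top strand carries both matching primer binding sequences.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_amplifiable(top_strand, forward_primer, reverse_primer):
    """True when both primers of the pair have a binding site on the template."""
    return (top_strand.startswith(forward_primer)
            and top_strand.endswith(reverse_complement(reverse_primer)))


def select_couplets(pool, forward_primer, reverse_primer):
    """Names of the couplets in the pool that the primer pair would amplify."""
    return [name for name, top in pool.items()
            if is_amplifiable(top, forward_primer, reverse_primer)]


# Two illustrative primer sets (assumed sequences, not from the disclosure).
FWD_A, REV_A = "ACGTACGTAC", "TTGGCCAATT"
FWD_B, REV_B = "GGATCCTTAA", "CAGTCAGTCA"

# A mixed pool in which each couplet carries a distinct set of flanks.
POOL = {
    "couplet_1": FWD_A + "ATGGCTAGCAAGGAGATATA" + reverse_complement(REV_A),
    "couplet_2": FWD_B + "CCATGGGCAGCTTGACCTGA" + reverse_complement(REV_B),
}

print(select_couplets(POOL, FWD_A, REV_A))
print(select_couplets(POOL, FWD_B, REV_B))
```

Under these assumptions each primer pair selects only the couplet carrying its own set of primer binding sequences, and a mismatched pair (for example FWD_A with REV_B) selects nothing.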

Another advantage is that the present invention allows for the amplification of oligonucleotides even when they are present in the reaction in limited supply, for example when present on a solid support (e.g., a microarray). Oligos synthesized on solid phase platforms are typically present at concentrations that render them unsuitable for assembly, but the present invention avoids this disadvantage and allows them to be amplified and easily assembled into larger nucleic acid molecules.

Additional advantages in the methods include the suppression of errors. The pairing of the oligonucleotides into couplets forces the overlapping regions to anneal in a way that selects for oligos with fewer errors, thus giving a statistical preference for amplifying the oligos with fewer or no errors. The present methods are therefore also methods for producing nucleic acids having a reduced error rate compared to conventional methods of amplification and assembly. This is an additional advantage not available with conventional array-based oligos or conventional methods.

Still another advantage of the invention is the ability to achieve “scarless” sequence removal, such as scarless flank removal (SFR). Previously there has been no appropriate method for removing conserved primer binding sequences from PCR products or otherwise undesired sequences in the DNA. Any of the methods disclosed herein can use scarless sequence removal, meaning that all nucleotides related to the flanking primer binding sequences, or to another undesired sequence, are removed and a pre-determined sequence of DNA can be synthesized and assembled in the methods. The invention provides a method for removing DNA from the terminal ends of PCR or other amplification or assembly method products without the use of restriction enzymes. The method therefore permits the use of a larger number of primer binding sequences because it does not rely on the sequence constraints imposed by restriction enzymes. Furthermore, the “scarless” removal of the primer binding sequences on the oligonucleotide pairs or couplets allows for a more robust assembly process to proceed because the DNA overlaps are exposed after this removal is achieved. The DNA assembly methods of the invention can therefore be automated or performed in programmable steps, and can be performed in a stepwise fashion. Nevertheless, in some embodiments restriction sites can be located on the 3′ and/or 5′ primer binding sequences and restriction enzymes can be used to conduct the methods.
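The outcome of scarless flank removal can be expressed as simple sequence bookkeeping. The Python sketch below is illustrative only: it models the result of the removal (every flank nucleotide gone, every internal nucleotide retained) and does not model the enzymatic steps. The function name remove_flanks and the example sequences are assumptions made for this illustration.

```python
def remove_flanks(product, five_prime_flank, three_prime_flank):
    """Return the internal sequence left after both flanks are removed.

    `product` is the top strand (5'->3') of an amplified couplet. The flanks
    must be present exactly at the termini; otherwise an error is raised, so
    that a partial removal is never reported as a scarless one.
    """
    if len(five_prime_flank) + len(three_prime_flank) > len(product):
        raise ValueError("flanks are longer than the product")
    if not product.startswith(five_prime_flank):
        raise ValueError("5' flank not found at the 5' terminus")
    if not product.endswith(three_prime_flank):
        raise ValueError("3' flank not found at the 3' terminus")
    end = len(product) - len(three_prime_flank)
    return product[len(five_prime_flank):end]


# Assumed example: two 10-nt flanks around a 20-nt internal sequence.
FLANK_5 = "ACGTACGTAC"
FLANK_3 = "AATTGGCCAA"
INTERNAL = "ATGGCTAGCAAGGAGATATA"

trimmed = remove_flanks(FLANK_5 + INTERNAL + FLANK_3, FLANK_5, FLANK_3)
print(trimmed == INTERNAL)
```

In this model the removal is "scarless" when the returned sequence is identical to the designed internal sequence, with no residual flank nucleotides at either end.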

Oligonucleotide Pairs

The oligonucleotide pairs or couplets used in the methods have certain characteristics. Each oligonucleotide of a pair contains a portion of the desired sequence of the DNA molecule. Each oligonucleotide of a pair has an internal sequence 112, which contains an overlapping and complementary sequence to an internal sequence of the other oligonucleotide of the pair. When the plurality of pairs of oligonucleotides are schematically or illustratively arranged in order (e.g., as depicted in FIG. 1 and FIG. 3), adjacent to each other and overlapping according to their internal sequences, they comprise at least a portion of the desired sequence, and can comprise a set of couplets or oligo pairs that comprise the entire desired sequence when each oligo is extended. One or both oligonucleotides in a pair or couplet can overlap with oligonucleotides in one or more adjacent pairs when arranged according to their internal sequences. One or both oligonucleotides in a pair or couplet can overlap with a third oligonucleotide and the three can form a triplet. The oligonucleotides can also have a 3′ primer binding sequence 110 and/or a 5′ primer binding sequence 115 for amplifying the oligo pairs and discouraging undesirable annealing, and in some embodiments each oligonucleotide of a pair has the 3′ and/or 5′ primer binding sequence. Furthermore, oligo pairs representing the end oligonucleotides of the DNA molecule of desired sequence can additionally have a universal 3′ flanking sequence 120 or a universal 5′ flanking sequence 125, which can be present inside of the 3′ and 5′ primer binding sequence 110, 115, respectively. The end oligonucleotides are those oligos that comprise the ends of the desired sequence that is assembled, and are depicted in FIG. 1 having the flanking sequences 120 and 125.
The end oligonucleotides can be contained in an oligonucleotide pair or couplet that overlaps only one other pair or couplet, the adjacent one, in the set of oligonucleotides comprising the desired nucleic acid sequence. In some embodiments the end oligonucleotides are the only oligonucleotides that comprise the universal 3′ and/or 5′ flanking sequences 120, 125. In one embodiment when the plurality of pairs of oligonucleotides are arranged in order, adjacent to each other and overlapping according to their internal sequences, and considering only their internal sequences and, optionally, the universal flanking sequences, they together comprise the sequence of the DNA molecule of desired sequence on one or the other strand (not counting any 3′ and/or 5′ primer binding sequences on said pairs of oligonucleotides), as generally depicted in FIG. 1. When the desired nucleic acid molecule is a double-stranded DNA, the plurality of pairs of oligonucleotides can also comprise, at each nucleotide in the DNA molecule, a nucleotide on at least one of the two strands of the double-stranded DNA molecule. The nucleotides of the opposite strand are, of course, filled in by DNA polymerase extension during amplification and/or assembly to form the desired double-stranded DNA molecule. This is also illustrated in FIG. 1 where after a first amplification the plurality of pairs comprise the DNA molecule of desired sequence, which can be assembled to form the DNA molecule of desired sequence as depicted at the bottom of FIG. 1.

In the methods a first (and optionally second and subsequent) amplification reaction(s) can be performed on the plurality of pairs of oligonucleotides until a suitable quantity of pairs of oligos are present. The 3′ and 5′ primer binding sequences 110, 115 can then be removed from the plurality of pairs of oligonucleotides, and the plurality of pairs can be subjected to an assembly reaction to thereby assemble the dsDNA molecule having the desired sequence. The pairs can be readily assembled and the assembled nucleic acid further amplified because the end oligonucleotides contained the universal 3′ and 5′ flanking sequences 120, 125, which are still present at the ends of the assembled molecule.

The desired sequence can be a pre-determined sequence, meaning that the sequence assembled in the method is a specific sequence known or determined by the user prior to conducting the method. The desired sequence to be assembled in the method is therefore predictable. Random homologous recombination does not normally result in a “desired” sequence because the user cannot predict the precise location where the recombination will occur, and therefore cannot predict the specific sequence that will result from the method. The double-stranded nucleic acid molecule can be a DNA molecule, an RNA molecule, an rRNA molecule, a cDNA molecule, or any nucleic acid molecule. In various embodiments the nucleic acid molecule can be a gene or a functional gene and can encode a protein or polypeptide, or can be a regulatory sequence, or a “housekeeping” gene sequence, or any nucleic acid sequence. The nucleic acid molecule can also be a portion of a gene, a whole gene, a gene pathway, a whole genome (e.g., of a bacterium, alga, cyanobacterium, virus, etc.), a promoter, a terminator, a regulatory sequence, a CRISPR/Cas9 gRNA template, a DNA-based computer memory, or any other nucleic acid sequence. In various embodiments the nucleic acid sequence can be a portion of any of the types of nucleic acid sequences above, e.g., at least 25% or at least 50% or at least 75% or 25-99% or 50-99% or 75-99% of a gene, portion thereof, or of any of the nucleic acid sequences recited above.

Referring to FIG. 1 the pairs of oligonucleotides at least partially bound to each other (“couplets”) have particular parts, including an internal sequence 112, which contains a portion of the desired nucleic acid sequence of the DNA molecule to be assembled. The oligonucleotide pairs have at least one portion of the internal sequence 112 that overlaps and is complementary to 117 a portion of the internal sequence of the other member of the pair, and the two oligonucleotides can thereby at least partially bind to each other and form a “couplet.” Thus, a pair of oligonucleotides can bind to form a couplet. In various embodiments the couplets are bound to each other by at least one or at least three or at least five or at least eight or at least 10 nucleotides by hydrogen bond pairing. The overlapping portions can at least partially bind to each other by forming hydrogen bonds and thus anneal to each other at the complementary sequence. Examples of oligonucleotide couplets are depicted in FIG. 1. The internal sequences form at least a portion of the desired nucleic acid molecule. In FIG. 1 the desired nucleic acid molecule is depicted as 150, which in this embodiment contains the universal 3′ and 5′ flanking sequences 120, 125. When the oligonucleotides are assembled in order, adjacent to each other and according to their internal sequences, the oligonucleotides comprise at least a portion of the sequence of the desired dsDNA molecule 150. In some embodiments a portion of the sequence of the desired DNA molecule is contained on each oligonucleotide in the set. In some embodiments all of the oligonucleotides together comprise the nucleic acid sequence of the nucleic acid molecule being assembled, or a complement thereof, as depicted in FIG. 1. The pairs of oligonucleotides can also have 5′ and/or 3′ primer binding sequences 115, 110 (or complementary sequences thereof). Complementary or complement sequences refer to standard Watson-Crick base pairing.
In some embodiments the 3′ primer binding sequence and 5′ primer binding sequence on an oligo pair can be different sequences to avoid circularization and self-annealing of the two ends of the couplet or pair. Each oligonucleotide in the pair can thus comprise a primer binding sequence at its 3′ and/or 5′ end. In some embodiments a plurality of oligonucleotide pairs can be present in a single container and form a plurality of couplets, and the plurality of couplets can each have a distinct set of 3′ or 5′ primer binding sequences, as explained herein. In some embodiments the couplets can be amplified and/or assembled in the single container. The method can therefore also involve the use of at least two sets of primers that bind specifically to the at least two couplets, respectively.
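The overlapping, complementary portion that lets two oligonucleotides bind as a couplet can be located computationally. The Python sketch below is a minimal illustration under stated assumptions: both oligos are written 5′ to 3′, only perfect Watson-Crick pairing is considered, and the overlap is taken to be the longest stretch of one oligo whose reverse complement occurs in the other. The function names and all example sequences (including the two 3′ primer binding sequences) are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def complementary_overlap(oligo_a, oligo_b):
    """Longest stretch of oligo_a that can base-pair perfectly with oligo_b.

    Implemented as a longest-common-substring search between oligo_a and the
    reverse complement of oligo_b (row-by-row dynamic programming).
    """
    b_rc = reverse_complement(oligo_b)
    best_len, best_end = 0, 0
    previous = [0] * (len(b_rc) + 1)
    for i in range(1, len(oligo_a) + 1):
        current = [0] * (len(b_rc) + 1)
        for j in range(1, len(b_rc) + 1):
            if oligo_a[i - 1] == b_rc[j - 1]:
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return oligo_a[best_end - best_len:best_end]


# Assumed 30-nt desired sequence and two assumed 3' primer binding sequences.
DESIRED = "ATGGCTAGCAAGGAGATATACCATGGGCAG"
PBS_A = "CTTGACCTGA"
PBS_B = "TCGGATTCAC"

# Each oligo carries part of the desired sequence plus a 3' flank; the two
# internal sequences share positions 12-19 of the desired sequence.
oligo_a = DESIRED[12:] + PBS_A
oligo_b = reverse_complement(DESIRED[:20]) + PBS_B

overlap = complementary_overlap(oligo_a, oligo_b)
print(overlap, len(overlap))
```

Because the two primer binding sequences in this example are not complementary to each other, the only extended pairing found is the designed internal overlap, which is the behavior the non-complementary flanks are intended to produce.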

Some pairs of oligonucleotides comprise the ends of the desired dsDNA molecule and are “end oligonucleotides.” In addition to the other characteristics of an oligonucleotide or couplet, such as a 3′ and/or 5′ primer binding sequence, the end oligonucleotides can further comprise a 5′ and/or 3′ universal flanking sequence 125, 120. The universal flanking sequences 125, 120 can be a primer binding site, and can be a primer binding site of a different sequence or that binds a different primer than the 3′ and/or 5′ primer binding sequences 110, 115. The universal flanking sequences on the 5′ and 3′ end oligonucleotides can be the same sequence or a different sequence from each other. The universal flanking sequences allow for downstream amplification and/or assembly of the dsDNA molecule of desired sequence after it has been assembled.

The oligonucleotide pairs utilized in the invention can be synthesized through any convenient method. While various methods of synthesizing oligonucleotides by design are known in the art, one example is in situ synthesis on solid phase microarrays where the solid phase is loaded with a multiplicity of different sequences during the synthesis. But any method of synthesizing oligonucleotides can be used. Oligonucleotides can also be traditional partitioned oligos derived from DNA from a natural source. The oligonucleotides can be formed into couplets after synthesis or other method of obtaining the oligonucleotides. Any of the flanking sequences can be synthesized with the synthesis of the oligonucleotide.

The starting oligonucleotides or couplets can be those present prior to the first amplification step and those on which the first amplification step is performed. The starting oligonucleotides can be annealed together to form couplets. In various embodiments the starting oligonucleotides can be from 10-20 nucleotides or 15-20 or 20-60 nucleotides or 20-80 or 20-100 or 20-200 or 20-225 or 20-250 or 40-60 nucleotides or 40-85 or 40-100 or 40-225 or 50-150 or 50-200 or 50-250 or 60-200 or 60-150 or 60-120 or 60-100 or 60-85 or 80-120 or 40-150 or 40-200 or 40-225 or 40-250 nucleotides, or greater than 40 nucleotides or greater than 50 or greater than 60 or greater than 75 or greater than 100 or greater than 250 nucleotides. The overlapping region in an oligonucleotide pair can be at least 5 bp or at least 8 or at least 10 or at least 12 or at least 15 or at least 17 or at least 20 bp or at least 40 or at least 60 or at least 80 or at least 100 bp or from 5-10 or 5-12 or 5-15 or 5-17 or 5-20 or from about 15 to about 30, or from about 12 to about 35, or from about 15 to about 60 bp, or from about 15 bp to about 120 bp, or from about 20 to about 200 bp, or from about 20 to about 120 bp, or from about 20 to about 100 bp, or from about 20 to about 80 bp, or from about 20 to about 50 bp, or from about 25 to about 40 bp, or from about 20 to about 40 bp in length, or from about 100 bp to about 500 bp, or from about 200 bp to about 700 bp, or from about 200 bp to about 1000 bp, or from about 200 bp to about 1500 bp, or from about 30 to about 200 bases, or from about 20 to about 150 bases, or from about 20 to about 120 bases, or from about 20 to about 100 bases, or from about 20 to about 80 bases, but any suitable length of overlap can be used.
Oligonucleotides or oligo pairs can be applied in the method at any suitable concentration, and non-limiting examples include less than 2 nM or less than 3 nM or less than 5 nM or less than 2.5 nM or less than 1.25 nM or less than 1.0 nM or less than 700 fmol or less than 500 fmol or less than 250 fmol or less than 100 fmol or less than 1 fmol, or 0.5-2 nM or 0.5-5 nM or 2-5 nM or 2-10 nM or 2-20 nM or 1-100 fmol or 1-1000 fmol or 500-1000 attomol or 700-1000 attomol.

Oligonucleotides in an oligo pair can be designed to have an internal sequence that will be part of either strand (e.g., the sense or anti-sense strand) of the nucleic acid molecule of the desired sequence to be assembled. In various embodiments each oligonucleotide in a set of oligonucleotides will have an internal sequence that is part of one of the strands (e.g., the sense or anti-sense strand) of the desired sequence.

In one embodiment the oligonucleotide pairs are part of a set of oligonucleotides, and the set of oligonucleotides, when assembled according to a method of the invention, comprises the nucleic acid molecule having a desired sequence. The set can optionally have primer binding sequences at their 3′ and/or 5′ ends, and universal flanking sequences on the end oligonucleotides. The set can also include sequences that are subsequently deleted to form the nucleic acid of desired sequence.

Internal Sequence

The oligonucleotides of the invention contain an internal sequence 112. In one embodiment when the oligonucleotide pairs are arranged adjacent to each other and in proper order according to their internal sequences, the internal sequences comprise all or a portion of the sequence of the nucleic acid molecule of desired sequence that is assembled in the methods. The internal sequences of two oligos of a pair at least partially overlap and the oligos of the pair can anneal to each other. A “couplet” is comprised of two oligos that at least partially overlap in their internal sequences and examples are depicted in FIG. 1. The internal sequences can also overlap at least partially with one or more oligonucleotides of an adjacent couplet to form all or a portion of the sequence of the nucleic acid molecule of desired sequence. Thus, a particular oligo can overlap in some instances with one adjacent oligo to form a pair and in other instances with a second adjacent oligo to form a second, distinct pair. Nevertheless, the oligo pairs or couplets are a set that can be assembled into the nucleic acid of desired sequence. The end oligonucleotides can overlap with an oligonucleotide of only one adjacent oligonucleotide. To be arranged according to their internal sequences means the oligos are arranged adjacent to each other so that the internal sequences together form all or a portion of the sequence of the desired nucleic acid molecule. The internal sequences can also comprise all or a portion of the sequence of the nucleic acid molecule of desired sequence not counting universal flanking sequences on the nucleic acid molecule of desired sequence. In some embodiments the internal sequences can also comprise the overlapping portions of the oligonucleotide pairs. Each such portion of an oligonucleotide can be complementary to a portion of the other oligonucleotide of the pair.

3′ and/or 5′ Primer Binding Sequences

The oligonucleotides of the invention, and therefore an oligo pair or couplet, can also comprise a 3′ and/or a 5′ primer binding sequence. The primer binding sequence can be a primer binding site for one or more primers, and is useful during the amplification reactions. In some embodiments each oligo of a pair or couplet comprises a 3′ primer binding sequence, or a 5′ primer binding sequence. But of course when the oligo pairs are extended to form a full double-stranded nucleic acid molecule a primer binding sequence will be present on both the 3′ and 5′ ends. FIG. 1 illustrates an example where the oligos in a pair each comprise a 3′ primer binding sequence 110 and are extended to a double-stranded nucleic acid with a primer binding sequence on both the 3′ and 5′ ends 110, 115. In one embodiment the 3′ and/or 5′ flanking primer binding sequences can be designed and synthesized as part of the oligonucleotides at the stage of oligonucleotide synthesis. The oligonucleotides can be synthesized so that each oligo has either a 3′ or a 5′ flanking primer binding sequence. When the oligonucleotides anneal to form couplets the couplet can have a primer binding sequence at the 3′ and/or 5′ end. In other embodiments the 3′ and/or 5′ primer binding sequences can be added later to the oligonucleotides. The 3′ and/or 5′ primer binding sequences can also be introduced to the oligonucleotides or oligo pairs subsequent to synthesis. The 3′ and 5′ primer binding sequences can be introduced by any suitable method, with ligation and amplification being two examples. The primer binding sequences can be introduced as single or double-stranded nucleic acid sequences and can be a known or unknown sequence, as long as it can perform the function of the primer binding sequence. But some oligonucleotides can lack a 3′ and/or 5′ primer binding sequence. In some embodiments these oligonucleotides that lack a primer binding sequence can be members of a triplet and overlap with two other oligonucleotides, which can each have a 3′ and/or 5′ primer binding sequence.

Considering an embodiment where oligonucleotide pairs are arranged adjacent to each other and according to their internal sequences to comprise at least a portion of the nucleic acid of desired sequence (e.g., FIG. 1), each oligonucleotide pair has a 3′ and/or 5′ primer binding sequence on its corresponding distal end. These sequences comprised on a particular pair or couplet can be referred to as a “set” of 3′ and/or 5′ primer binding sequences. Nevertheless, when the couplet is amplified and extended the double-stranded DNA fragment formed will have a primer binding sequence on both the 3′ and the 5′ distal ends. Sets of primer binding sequences can be conveniently depicted as letters, numbers, or any symbol that distinguishes one set from a set of different sequence. In some embodiments the oligonucleotides of a pair or couplet contain a set of primer binding sequences that is different from and non-complementary to the set on the adjacent oligonucleotide pair or couplet and thus, they do not anneal to an adjacent oligo pair or couplet. An oligo pair or couplet can also contain different and non-complementary primer binding sequences on their 5′ and/or 3′ ends, and therefore is prevented from forming a circular DNA molecule by self-annealing. Thus, the primer binding sequences in a set can be different from each other but, in other embodiments, can also be the same. In a particular embodiment two sets of 3′ and 5′ primer binding sequences are used in a method where the oligonucleotide pairs or couplets are arranged adjacent to each other and according to their internal sequences to form the desired nucleic acid. In this embodiment one set is used on the odd numbered oligonucleotide pairs or couplets (e.g., couplet 1, 3, 5, etc.) and a second, different set is used on the even numbered oligonucleotide pairs or couplets (e.g., couplet 2, 4, 6, etc.). The “even” and “odd” numbered pairs are determined by considering the first oligonucleotide pair or couplet in the above arrangement as the first “odd” number and the second oligonucleotide pair or couplet in the arrangement as the first “even” number, and so on, for example as numbered in FIG. 3. Utilizing such an arrangement can allow multiple nucleic acid molecules to be assembled from a single pool of couplets and, optionally, in a single reaction in the method because, for example, couplets 1 and 3, or couplets 2 and 4 will not anneal to each other due to the differences in their sequences, even if they have the same set of primer binding sequences. And couplets 1 and 2, or 3 and 4, will not anneal to each other because the primer binding sequences are distinct and non-complementary. Thus, the odd numbered or the even numbered oligo pairs or couplets (i.e., alternate oligo pairs or couplets) can be amplified in a single pool.
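The alternating assignment described above can be illustrated with a short Python sketch. It is offered only as an example; the function names and the set labels A-B and C-D are placeholders standing in for actual primer binding sequences, and the couplets are assumed to be numbered from 1 in their order of assembly.

```python
# Hypothetical names; labels stand in for real primer binding sequences.
def assign_primer_sets(n_couplets, odd_set=("A", "B"), even_set=("C", "D")):
    """Map couplet number (1-based, in assembly order) to a primer binding
    sequence set, alternating between the odd and even sets."""
    return {
        number: (odd_set if number % 2 == 1 else even_set)
        for number in range(1, n_couplets + 1)
    }


def adjacent_sets_differ(assignment):
    """Return True if no two neighbouring couplets carry the same set."""
    numbers = sorted(assignment)
    return all(
        assignment[a] != assignment[b] for a, b in zip(numbers, numbers[1:])
    )


sets_by_couplet = assign_primer_sets(4)
for number, primer_set in sets_by_couplet.items():
    print(f"couplet {number}: set {'-'.join(primer_set)}")
print("adjacent couplets differ:", adjacent_sets_differ(sets_by_couplet))
```

Under this labelling, couplets 1 and 3 share set A-B and couplets 2 and 4 share set C-D, so each alternate group can be amplified with a single primer pair.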

Thus, primer binding sequences A and B can represent two primer binding sequences that are different sequences and not complementary and, therefore, do not anneal to each other. A and B together can form a set A-B. Referring to FIG. 3, in one embodiment constructs 1-4 comprise at least one strand of a double-stranded DNA molecule of a desired sequence. An A-B set of flanking primer binding sequences can be placed on the odd numbered oligonucleotide pairs or couplets and a C-D set of flanking primer binding sequences can be placed on the even numbered oligonucleotide pairs or couplets. This arrangement prevents circularization of a couplet by self-annealing and also prevents adjacent couplets from annealing before the intended step in the method. Instead the couplet is preserved and available for amplification until assembly of the DNA molecule is desired. Nevertheless, specific primer binding sequences can be varied and selected based on the needs of the particular application. Thus, in some embodiments each oligo pair or couplet can have the same 3′ and/or 5′ primer binding sequence if desired (e.g., set A-A). This embodiment might be selected in cases where, for example, the members of the oligonucleotide pair or couplet are short enough that formation of a circular molecule will not occur. In one embodiment the 3′ and 5′ primer binding sequences do not comprise a restriction site that is cleavable by a restriction enzyme (e.g., a restriction endonuclease).

The actual primer binding sequence can be any appropriate sequence to which a primer can bind. In some embodiments the 3′ and/or 5′ primer binding sequence is a poly-A sequence. In some embodiments the sequence can be a universal primer sequence. In various other embodiments the sequence can contain one or more uridine nucleotide residues or other nucleotides that provide a cleavage site for an enzyme to remove the primer binding sequence. Examples include, but are not limited to, having a deoxyuridine every fourth base or every third base or fifth base or every sixth base to make the site a substrate for the enzyme UDG and prepare the sequence for removal by other enzymes. In other embodiments the sequence can have at least one uridine per 5 bases or at least two or at least three uridines per 5 bases. In other embodiments the primer binding sequence can contain a poly-U having four or five or six or seven or eight or nine or ten or more than ten dU nucleotides, or any combination of the above. But in other embodiments the flanking sequence can be a binding site for a restriction enzyme. In other embodiments the site can be designed so it is not a restriction site that will be cleaved by any restriction enzyme. The 3′ and/or 5′ primer binding sequences can include non-standard bases for which an enzyme is available that cleaves or specifically marks a nucleotide or nucleotide sequence for cleavage and removal by another enzyme. In some embodiments the primer binding sequence(s) can be a site for cleavage by a particular restriction enzyme or group of enzymes. Furthermore, the primer binding sequences can be binding sites for Cas9 enzyme, and thus CRISPR/Cas9 can be used to cleave off the flanking sequence.
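As a non-limiting illustration, the deoxyuridine spacing of a candidate primer binding sequence can be checked with a few lines of Python. The sketch below is an assumption-laden example rather than a disclosed procedure: the function names are hypothetical, dU is written as the letter U, and the 12-base sequence is arbitrary.

```python
# Hypothetical helpers; dU residues are represented by the letter "U".
def has_du_every_nth(seq, every):
    """Return True if every `every`-th base (counting from 1) is dU."""
    seq = seq.upper()
    return all(seq[i] == "U" for i in range(every - 1, len(seq), every))


def min_du_per_window(seq, window=5):
    """Return the smallest number of dU residues found in any run of
    `window` consecutive bases (or in the whole sequence if shorter)."""
    seq = seq.upper()
    if len(seq) < window:
        return seq.count("U")
    return min(
        seq[i:i + window].count("U") for i in range(len(seq) - window + 1)
    )


candidate = "ACGUACGUACGU"  # arbitrary example with dU at every fourth base
print(has_du_every_nth(candidate, 4))
print(min_du_per_window(candidate, window=5))
```

A design rule such as "at least one uridine per 5 bases" then corresponds to requiring that `min_du_per_window(candidate, 5)` be at least 1.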

The primer binding sequences of the invention can be any length suitable for a primer binding sequence under the reaction conditions. In various embodiments the primer binding sequence can be from about 6-30 or 8-30 nucleotides in length, or from about 6-40, or 6-25 or 6-20 or 6-15 nucleotides, or 8-25 or 8-20 nucleotides, or from about 10 to about 30 nucleotides or 10-25 or 10-20 nucleotides, or from about 12 to about 25 nucleotides or 12-30 or 12-20 nucleotides, or from about 15 to about 25 nucleotides or from about 15 to about 35 nucleotides or from 15 to about 50 nucleotides or from 10-100 or from 20-250 or from 25 to 350 or from 25 to 500 or from 10 to 1000 nucleotides.

In another embodiment a distinct 3′ and 5′ primer binding sequence set can be used for each oligonucleotide pair or couplet to be assembled into the DNA molecule of desired nucleotide sequence. This embodiment allows a particular oligonucleotide pair or couplet (i.e., that pair or couplet having the corresponding primer binding sequence) to be assembled in a mixture containing several or all of the oligonucleotide pairs or couplets. In one embodiment various aliquots of synthesized oligonucleotides can be set out separately and a primer or primer set corresponding to a particular primer binding sequence can be used to amplify only certain oligo pairs or couplets (or DNA fragments) in a specific sample mixture.

Universal Flanking Sequences

One or more of the oligonucleotide pairs or couplets that comprise a DNA molecule of desired sequence can have a 3′ and/or 5′ universal flanking sequence 120, 125. The universal flanking sequence is a sequence that can serve as a primer binding site for the amplification of the assembled dsDNA molecule of desired sequence 150. Primers that bind to these sequences can be provided in an amplification reaction after assembly of the nucleic acid molecule of desired sequence. In some embodiments the end oligonucleotides have a 3′ or 5′ universal flanking sequence, but not both with respect to the nucleic acid molecule of desired sequence. In some embodiments all of the oligonucleotides being assembled have either a 3′ or 5′ primer binding sequence, but not both.

In one embodiment the universal flanking sequence is present on only the end oligonucleotides or couplets, i.e., the oligonucleotides corresponding to (and that will form) the sequence of the 3′ and/or the 5′ ends of the DNA molecule of desired sequence when the pairs or couplets are arranged adjacent to each other and in proper order according to their internal sequences so that the internal sequences comprise the sequence of the nucleic acid molecule of desired sequence assembled in the methods. In some embodiments the universal 3′ and/or 5′ flanking sequence is present on the end oligonucleotides “inside” the 3′ and/or 5′ primer binding sequence, i.e., proximal to the 3′ and/or 5′ primer binding sequence and distal to the internal sequence (e.g., as depicted in FIG. 1). Proximal indicates away from the outer ends of the nucleic acid molecule of desired sequence to be assembled and towards the center or overlapping region in the couplets; distal indicates towards the outer ends. The 3′ and/or 5′ universal flanking sequences are therefore useful for amplifying the DNA molecule of desired sequence that is assembled in the methods, and this step can occur after assembly. The 3′ and/or 5′ universal flanking sequences can be the same sequence or different sequences at the 3′ and 5′ ends of the nucleic acid molecule of desired sequence, and can utilize sequences as described for the 3′ and/or 5′ primer binding sequences described herein.

While various methods of assembling DNA are available in the art, the present methods offer the ability to assemble a DNA molecule of desired nucleotide sequence where the nucleic acid molecule assembled does not comprise an expressed sequence tag; in other embodiments the oligonucleotide pairs or couplets being assembled do not comprise an expressed sequence tag; in other embodiments the method of assembly does not involve circularizing DNA or utilizing circularized DNA.

In some embodiments the universal 3′ and 5′ flanking sequences are removed after assembly of the nucleic acid of desired sequence, which can be done as described herein. But the universal flanking sequences can remain on the nucleic acid molecule and be used for subsequent amplification or other techniques, for example GIBSON ASSEMBLY®, or other subsequent DNA manipulation techniques.

First Amplification Reaction

The method can involve performing one or more steps of amplification of the plurality of couplets or pairs of oligonucleotides. The one or more amplification steps 101 can be performed according to any appropriate PCR method or other amplification procedure using methods and reaction parameters known to persons of ordinary skill in the art. “PCR,” the polymerase chain reaction, as used herein can include variants of PCR and non-limiting examples include multiplex PCR, “hot start” PCR, polymerase cycling assembly, assembly PCR, and quantitative PCR. The Examples provide exemplary PCR amplifications, but the person of ordinary skill understands the specific reaction parameters can be varied depending on the particular oligonucleotides being assembled. Additional examples of PCR amplification methods are described in US 2014/0308710, which is hereby incorporated by reference in its entirety. Subsequent amplification step(s) can be performed until a sufficient quantity of couplets or oligo pairs has been generated.

In the first and subsequent amplification step(s) of the methods the couplets or oligonucleotide pairs can be amplified and extended. At this stage the couplets can have the 3′ and 5′ flanking primer binding sequences because they have been formed into a double-stranded DNA fragment 116. The primer binding sequences prevent the couplets or oligo pairs from assembling with adjacent couplets or oligo pairs in the mixture prematurely and before intended, and before sufficient amplification of the couplets or oligo pairs has occurred.

Removal of 3′ and/or 5′ Primer Binding Sequences/Scarless Flank Removal

The methods can also involve a step of removing the 3′ and/or 5′ primer binding sequences from the couplets or pairs of oligos to produce a plurality of couplets or oligonucleotides having an internal sequence, and end oligonucleotides that have the universal flanking sequences. This step can occur after one or more steps of amplification of the couplets or oligo pairs.

Previously available methods of removing primer binding sites from DNA involved the use of restriction enzymes, which leave nucleotide artifacts and unwanted nucleotide sequences. In some embodiments scarless flank removal can be performed, which allows for the removal of conserved DNA flanking sequences from PCR products and can be done without the use of any restriction enzymes. This therefore allows an unlimited number of primer binding or flanking sequences to be used because it does not rely on sequence constraints imposed by the need for restriction enzymes. The method also allows for a more robust assembly because the DNA overlaps are exposed after this removal is achieved, allowing for the couplets or oligo pairs to be assembled into the desired nucleic acid sequence. An assembled product can be produced that has no extraneous, nonspecific nucleotide remnants from restriction enzyme cleavage and no otherwise unwanted base pairs left in the nucleic acid molecule being assembled.

Scarless flank removal exposes the overlaps between the couplet pairs and permits their assembly, and can be achieved using a number of enzymes. In some embodiments the 3′ and/or 5′ primer binding sequences 110, 115 present on the oligonucleotide pairs can comprise non-standard (for DNA) bases, for example deoxyuridine bases (dU) or poly dU bases, so as to be a substrate for an enzyme that selectively cleaves at the non-standard base (or base that is otherwise site specific with respect to enzyme cleavage). In some embodiments the non-standard base is dU and the enzyme is uracil-DNA glycosylase (UDG). Non-standard bases can allow the sequences to be recognized by enzymes that can remove them at an appropriate step in the methods. Thus, in some embodiments the 3′ and/or 5′ primer binding sequences can comprise a substrate or a sequence recognized by an enzyme (or enzymes) that specifically cleave(s) the primer binding sequences, which are thus removed. In some embodiments the specific cleavage occurs at a non-standard base (e.g., dU), but can also be sequence specific, or another means of selective cleavage, e.g., restriction cleavage. In some embodiments the 3′ and/or 5′ primer binding sequences comprise deoxy-uracil residues, which are substrates for UDG. In some embodiments a mixture of enzymes can be used to remove the 5′ and/or 3′ primer binding sequences. In some embodiments the enzyme mixture can comprise Uracil-DNA glycosylase (UDG), endonuclease VIII (Endo VIII, a DNA glycosylase-lyase), and exonuclease T (Exo T). Without wanting to be bound by any particular theory it is believed that in these embodiments the UDG catalyzes the release of uracil from uracil-containing nucleotides, leaving an apyrimidinic site and a substrate for endonuclease VIII (endo VIII). The endo VIII then acts as an AP-lyase at the resulting site. Endo VIII cleaves 3′ and 5′ to the AP site leaving a 5′ phosphate and a 3′ phosphate. Exonuclease T (a.k.a. ExoT or RNase T) is a single-stranded RNA or DNA specific nuclease that requires a free 3′ terminus and removes nucleotides in the 3′ to 5′ direction to generate blunt ends. ExoT thus removes any single-stranded overhangs remaining to yield a blunt cut DNA molecule. While this combination of enzymes represents one embodiment of a mixture of enzymes that can be conveniently used, the person of ordinary skill with reference to this disclosure will realize other combinations of enzymes that will yield a suitable result by substituting any or all of these enzymes since other enzymes can have the same or very similar activities. While these embodiments illustrate the use of dU and UDG, any non-standard nucleotide can be used that has a corresponding enzyme that will cleave the site or mark it for cleavage by another enzyme.

Assembly

For assembly of the desired DNA molecule, primers can be added to the reaction that are complementary to and bind the 3′ and/or 5′ primer binding sequences. In some embodiments primers used in the invention comprise a forward primer and a reverse primer. In various embodiments the methods can amplify and/or assemble at least 3 or at least 5 or at least 10 or at least 50 or 3-10 or 3-20 or 3-24 or 3-30 or 3-50 or 3-60 or 3-70 or 3-80 or 3-100 or 3-120 or 5-10 or 5-30 or 5-200 or 8-15 or 10-30 or 10-50 or 25-70 or 25-100 or 25-120 or 25-150 or 25-200 or 25-225 or 25-250 or 25-300 oligonucleotide pairs or couplets (and consequently twice such numbers of oligonucleotides). In the methods the oligonucleotides form couplets as described herein.

After removal of the 3′ and/or 5′ primer binding sequences the resulting oligo pairs (or couplets) have overlapping, complementary regions with the adjacent pair(s) of oligonucleotides or couplet(s) and the set can be assembled into the nucleic acid molecule of desired sequence. At this step the end oligo pairs or couplets can still have the universal 3′ and/or 5′ flanking sequences. The nucleic acid molecule of desired sequence can also, optionally, be amplified one or more times by utilizing primers that bind to the universal 3′ and/or 5′ flanking sequences, 120, 125.

Any suitable method can be used to assemble the nucleic acid molecule of desired sequence, but in some embodiments polymerase cycling assembly (PCA or “assembly PCR”) is used for assembly. In PCA a DNA molecule is assembled from shorter oligonucleotides in a precise order based on the single-stranded oligonucleotides used in the process. Any number of cycles of PCA or other DNA assembly procedure can be conducted to assemble the nucleic acid molecule, for example at least 5 cycles or at least 10 cycles or at least 15 cycles or at least 20 cycles or at least 25 cycles or at least 30 cycles or at least 35 cycles or at least 50 cycles. Other assembly methods, or cloning or DNA joining methods can also be used. GIBSON ASSEMBLY® is another such method that can be used to assemble the resulting DNA fragments, and embodiments are described in U.S. Pat. No. 8,968,999, which is incorporated by reference herein in its entirety, including all tables, figures, and claims. The nucleic acid molecule of desired sequence is thus assembled. This nucleic acid molecule will still contain the 3′ and 5′ universal flanking sequences that were present on the 3′ and 5′ end oligonucleotides, which can also be removed if desired. An optional, second and subsequent amplification step(s) can be performed to amplify the nucleic acid of desired sequence after this step using primers that bind to the universal flanking sequences.
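The ordering principle behind overlap-based assembly can be illustrated in silico. The following Python sketch is a simplified string model and not a simulation of PCA chemistry: it assumes that the primer binding sequences have already been removed, that each fragment is given as its top-strand sequence in assembly order, and that adjacent fragments share an exact overlap. The function names, the 60-base example sequence, and the minimum overlap of 10 bases are illustrative assumptions.

```python
# Simplified string model; names and example data are hypothetical.
def merge_by_overlap(left, right, min_overlap=10):
    """Join two top-strand fragments through the longest exact overlap
    between the end of `left` and the start of `right`."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    raise ValueError("fragments do not share a sufficient overlap")


def assemble_in_order(fragments, min_overlap=10):
    """Assemble fragments that are already listed in their proper order."""
    product = fragments[0]
    for fragment in fragments[1:]:
        product = merge_by_overlap(product, fragment, min_overlap)
    return product


desired = "ATGGCTAGCAAGGAGTTCACCTGATCGTACCGGATTCAGCTTGACGAACTGCATGGTCAA"
# Three fragments with 12-base overlaps between neighbours.
fragments = [desired[0:28], desired[16:46], desired[34:60]]
assembled = assemble_in_order(fragments)
print(assembled == desired)
```

In this model a fragment that lacks the expected overlap with its neighbour raises an error rather than being joined, which mirrors the requirement that adjacent couplets have overlapping, complementary regions.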

The length of the oligonucleotides that form the couplets will depend on the number of couplets being assembled and the length of the nucleic acid molecule of desired sequence being assembled. In various embodiments the oligonucleotides forming the couplets can be 10-15 or 10-20 or 10-30 or 10-40 or 15-40 or 15-60 or 15-100 or 15-150 or 15-200 or 15-250 or 20-40 or 20-60 or 20-80 or 20-100 or 20-120 or 20-150 or 20-180 or 20-200 or 20-250 nucleotides in length. In other embodiments the oligonucleotides that form the couplets can be 30-40 or 30-60 or 30-80 or 30-100 or 30-120 or 30-150 or 30-180 or 30-200 or 30-250 nucleotides in length. In some embodiments the oligonucleotide pairs form a couplet through an overlapping region of at least 10 or at least 12 or at least 15 or at least 20 or at least 25 or at least 30 base pairs, which can be bound by standard Watson-Crick base pairing. In various embodiments the overlap can be expressed as a percentage of the sequence of either of the oligonucleotides, and in various embodiments at least 20% or at least 25% or at least 30% or at least 32% or at least 36% or at least 40% or at least 45% or at least 50% or 10-50% or 10-60% or 10-65% or 20-50% or 20-60% or 20-65% or 25-40% or 25-50% or 25-60% or 25-65% or 30-40% or 30-50% or 30-60% or 30-65% or less than 75% or less than 65% or less than 55% or less than 50% of the nucleotides in either of the oligonucleotides can be in the overlapping sequence to form a couplet. In various embodiments the percentages can relate to either the shorter or the longer of the two oligonucleotides. In other embodiments the oligonucleotides of the pair are of equal length, or within 10% or 20% or 30% or 40% or 50% of each other in length, and the same percentage overlap values can be used.
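Because the overlap percentage can be stated relative to either the shorter or the longer oligonucleotide of a pair, a short worked example may be helpful. The Python sketch below is illustrative only; the function names and the example lengths (a 20-base overlap between oligonucleotides of 50 and 80 nucleotides) are assumptions.

```python
# Hypothetical helper names; example lengths are arbitrary.
def overlap_percent(overlap_len, oligo_len):
    """Percentage of an oligonucleotide that lies in the overlapping region."""
    if not 0 < overlap_len <= oligo_len:
        raise ValueError("overlap must be positive and no longer than the oligo")
    return 100.0 * overlap_len / oligo_len


def couplet_overlap_percents(overlap_len, len_a, len_b):
    """Return (percent of shorter oligo, percent of longer oligo)."""
    shorter, longer = sorted((len_a, len_b))
    return overlap_percent(overlap_len, shorter), overlap_percent(overlap_len, longer)


of_shorter, of_longer = couplet_overlap_percents(20, 80, 50)
print(f"{of_shorter:.1f}% of the shorter oligo, {of_longer:.1f}% of the longer oligo")
```

Here the same 20-base overlap is 40% of the 50-nucleotide oligonucleotide and 25% of the 80-nucleotide oligonucleotide, which is why the reference oligonucleotide should be stated when a percentage range is selected.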

The nucleic acid molecule of desired nucleotide sequence to be assembled in the methods can be of any length. In some examples the nucleic acid molecule can be from about 40-100 bp or 50-100 bp or 50-150 bp or 80-120 bp or 100-1000 bp or 100-800 bp or 100-700 bp or 50-600 bp or 100-600 bp or 50-100 bp, or 50-1000 bp or 50-1500 bp, or 50-2000 bp, or 50 bp-5 kbp or 50 bp-6 kbp or 50 bp-7 kbp or 50 bp-10 kbp, or from about 1-10 kbp or from about 2-10 kbp, or from about 4-10 kbp or from about 5-10 kbp, or from about 5-15 kbp. In other embodiments the nucleic acid molecule to be assembled can have at least 100 bp and less than 1000 bp or less than 5,000 bp or less than 10,000 bp or less than 15,000 bp or less than 20,000 bp. In more embodiments the molecule can be greater than 1 kbp or greater than 2 kbp or greater than 3 kbp, or 1 kbp to about 5 kbp, or from 1 kbp to about 7 kbp, or from 1-10 kbp or from 1 kbp-12 kbp or from 1 kbp-15 kbp or from 1 kbp-16 kbp or from 1 kbp-17 kbp or from 1 kbp-20 kbp or 1 kbp-50 kbp or 1 kbp-100 kbp or 1 kbp-500 kbp or from 200-700 kbp or from 1 kbp-1 Mbp, or up to 3 Mbp or up to 5 Mbp or from 1 kbp-5 Mbp, or from 1 kbp-7 Mbp, or from 1 kbp-10 Mbp.

The Method

The methods of the invention can be practiced in various embodiments depending on the specific nucleic acid molecule to be assembled. In some embodiments a method is practiced by contacting a DNA polymerase and dNTPs with a plurality of oligo pairs or couplets of the invention, such as any described herein. In the methods any number of oligonucleotide pairs or couplets described herein can be amplified and/or assembled. For example, the method can amplify and/or assemble from 2-10 oligonucleotide pairs or couplets, or from 3-10 or 3-20 or 3-30 or 3-40 or 3-50 or 3-60 or 3-70 or 3-80 or 3-100 or 3-150 or 3-200 or 3-225 oligonucleotide pairs. The length of the oligonucleotides that form the pairs that are amplified and/or assembled can be any length of oligo pairs as described herein.

The methods can involve a step of forming couplets from the various pairs of oligonucleotides that will be assembled into the DNA molecule having the desired sequence. Persons of ordinary skill with reference to this disclosure understand the temperatures at which the oligo pairs can be annealed into couplets, and that the specific parameters depend on the length and composition of the oligo pairs. In some embodiments the oligo pairs can be annealed at less than 55° C. or less than 50° C. or less than 45° C. or 40-55° C. or 38-55° C., or other temperatures depending on the specific oligo pairs being annealed into couplets. In various embodiments the contacting of the pairs of oligonucleotides with the DNA polymerase and dNTPs can be done in a solution or a mixture and can occur simultaneously in a single container, or can be done sequentially. The DNA polymerase can be any suitable for the circumstances, but in some embodiments will be a thermostable DNA polymerase. Examples include, but are not limited to, Taq DNA polymerase or a Pyrococcus-type DNA polymerase. As a thermostable DNA polymerase it is active at temperatures of greater than 70° C. or greater than 90° C. or greater than 98° C. In particular embodiments a Pyrococcus-like enzyme containing a processivity enhanced domain to permit increased processivity is also suitable. While any DNA polymerase may be used, a DNA polymerase delivering high accuracy and high processivity will be most effective. DNA polymerases known in the art as being high fidelity, thermophilic DNA polymerases can also be used. In some embodiments the DNA polymerase can also have 5′→3′ DNA polymerase activity and/or a 3′→5′ exonuclease activity. In one embodiment the DNA polymerase generates blunt ends in the amplification of products in DNA amplification reactions. Additional, non-limiting examples of DNA polymerases that can be used in the invention include DNA polymerase from Pyrococcus furiosus, which can be modified at one or more domains to provide greater activity and/or greater accuracy than the native enzyme. The modification can include a change in the nucleic acid sequence of the enzyme to provide for an enzyme with more advantageous properties in a DNA assembly procedure. The DNA polymerase can have all or only some of these properties, and the person of ordinary skill with resort to the present specification will realize which properties can be advantageously employed in a particular application of the methods and which reaction conditions and buffer components are appropriate for a particular DNA polymerase. Examples of DNA polymerases suitable for the present methods include the commercially available PHUSION® High Fidelity DNA polymerase (Finnzymes, Oy, FI) or VENT® DNA polymerase, which has a 3′ to 5′ proofreading exonuclease activity. In one embodiment a master mix can contain the DNA polymerase with MgCl₂ at suitable concentration (e.g., 1-2 mM or 1.5 mM), as well as a mixture of dNTPs at a suitable concentration (e.g., 200 µM of each dNTP at final reaction concentration) in 100% DMSO. But other DNA polymerases are suitable and may also be used and VENT® DNA polymerase is another example.

When uridine is utilized in the 3′ and/or 5′ primer binding sequences or universal flanking sequences the DNA polymerase can be a uracil-literate DNA polymerase. Uracil-literate DNA polymerases can recognize uracil in DNA templates and do not cease activity upon encountering uracil-containing nucleotides. DNA polymerases from Pyrococcus furiosus or from Methanosarcina acetivorans, or those from the Family B DNA polymerases are uracil-literate, and the person of ordinary skill will realize other DNA polymerases that are uracil-literate and that can be utilized in the invention. The DNA polymerase can also be a DNA polymerase from a single-celled organism from the taxonomic domain and kingdom of Archaea. The enzymes can be thermostable and can be designed to be faster and more accurate and/or can extend DNA synthesis further than conventional DNA polymerases. In one embodiment the DNA polymerase can read through uracil, can extend a kilobase of sequence in less than 15 seconds or less than 17 seconds, and can have an accuracy at least 20× or at least 22× or at least 24× higher than Taq DNA polymerase. VERASEQ® Ultra DNA polymerase is an example of a uracil-literate DNA polymerase that functions in the invention.

The PCR and PCA assembly procedures used in the method can be those known to persons of ordinary skill in the art and utilized according to the present disclosure. By way of example, multiple cycles of PCR or PCA (or other assembly and/or amplification reactions) can be performed for the amplification and/or assembly reactions, for example, at least 10 cycles or at least 15 cycles or at least 20 cycles or at least 25 cycles or at least 30 cycles. Each cycle can comprise an annealing phase, an extension phase, and a denaturation phase, which are defined by the physical activities performed by the DNA during each phase. In certain embodiments the annealing phase and extension phase can be combined to occur during a combined annealing and extension phase. In some embodiments the annealing phase is performed at between 45° C. and 77° C., the extension phase is performed at between 50° C. and 77° C., and the denaturation phase is performed at greater than 70° C. or greater than 90° C. When the annealing phase and extension phase are combined into a single phase the combined phase can occur between 45° C. and 77° C., or 57-77° C., or at about 67° C. Persons of ordinary skill with resort to this disclosure will realize that the actual temperatures used during each of the phases are influenced by the size and content of the DNA being assembled. In some embodiments polyethylene glycol or another crowding agent can also be included in the mixture at an appropriate concentration, e.g., at least 0.0188% or at least 0.025% or at least 0.375%. Another crowding agent can also be used instead of or in combination with PEG, e.g., Ficoll 70, or high-mass, branched polysaccharides (e.g., dextran). When PCA is used for assembly it can utilize the same cycles and temperature ranges.
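As a non-limiting illustration, a proposed cycling profile can be checked against the temperature ranges recited above before it is programmed into a thermal cycler. The Python sketch below is an example only; the class and function names are hypothetical, and the limits encoded are the broadest ranges stated in the preceding paragraph (annealing at 45-77° C., extension at 50-77° C., denaturation above 70° C.).

```python
from dataclasses import dataclass


# Hypothetical names; limits are the broadest ranges recited in the text.
@dataclass(frozen=True)
class CycleProfile:
    denature_c: float
    anneal_c: float
    extend_c: float


def profile_problems(profile):
    """Return a list of human-readable problems; an empty list means the
    profile falls within the recited ranges."""
    problems = []
    if not profile.denature_c > 70:
        problems.append("denaturation should be above 70 C")
    if not 45 <= profile.anneal_c <= 77:
        problems.append("annealing should be between 45 C and 77 C")
    if not 50 <= profile.extend_c <= 77:
        problems.append("extension should be between 50 C and 77 C")
    return problems


# A combined annealing and extension phase is modelled by giving both
# phases the same temperature.
combined = CycleProfile(denature_c=98, anneal_c=67, extend_c=67)
print(profile_problems(combined))
print(profile_problems(CycleProfile(denature_c=65, anneal_c=40, extend_c=72)))
```

A narrower embodiment, such as denaturation above 90° C., can be checked by tightening the corresponding limit.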

The method allows multiple nucleic acid constructs to be assembled from a single sample pool. Considering an embodiment where, for example, five oligonucleotide pairs (e.g., derived from a solid phase synthesis) are to be amplified and assembled into a DNA molecule of desired sequence, the method allows all five oligonucleotide pairs to be assembled from a single pool of oligonucleotides. Each of the five oligonucleotide pairs can be synthesized to contain a different set of 3′ and/or 5′ primer binding sequences. Thus, the oligo pairs (or “couplets”) would have primer binding sequences as follows: couplet 1) A-B; couplet 2) C-D; couplet 3) E-F; couplet 4) G-H; couplet 5) I-J. After synthesis the oligo pairs can be combined and then divided into five pools. If one places the primers for set A-B in the first pool, and the primers for set C-D in the second pool, etc., then each couplet or oligo pair, and only that couplet or pair, will be amplified in the respective pool. In other embodiments all of the even and odd oligo pairs in a pool can have a different set of 3′ and/or 5′ primer binding sequences, with “even” and “odd” referring to the number assigned the oligo pair in a scheme for assembling the DNA molecule of desired sequence, for example as depicted in FIG. 3. But in other embodiments three or four or 3-8 or 3-10 or 5-10 or 5-15 or 5-20 or 5-25 or 5-30 or 5-50 or 3-200 or 3-225 or 3-250 or 3-300 couplets or oligo pairs, or any number of couplets as described herein, can be amplified and assembled, and in some embodiments from the same pool of synthesized (or parsed) oligonucleotides.
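By way of illustration only, the partitioning of the five couplets described above can be sketched in Python. The names COUPLETS, amplified, and partition are hypothetical and are not part of the disclosure, and the sketch makes the simplifying assumption that a couplet is amplified only when primers matching both of its primer binding sequences are present in the pool.

```python
# Hypothetical labels: the primer binding sequence pairs (A-B ... I-J)
# assigned to the five couplets in the example above.
COUPLETS = {
    "couplet 1": ("A", "B"),
    "couplet 2": ("C", "D"),
    "couplet 3": ("E", "F"),
    "couplet 4": ("G", "H"),
    "couplet 5": ("I", "J"),
}


def amplified(couplets, primer_pair):
    """Return the couplets whose primer binding sequences match the primer pair.

    Simplifying assumption: amplification requires a matching primer for
    both primer binding sequences of the couplet.
    """
    wanted = set(primer_pair)
    return [name for name, sites in couplets.items() if set(sites) == wanted]


def partition(couplets, primer_pairs):
    """Model dividing the combined pool into one aliquot per primer pair.

    Every aliquot contains all couplets; only the primer pair differs.
    """
    return {pair: amplified(couplets, pair) for pair in primer_pairs}


pools = partition(COUPLETS, list(COUPLETS.values()))
for pair, names in pools.items():
    print("primers %s-%s amplify: %s" % (pair[0], pair[1], ", ".join(names)))
```

In this sketch each aliquot amplifies exactly one couplet, which reflects the statement that each couplet, and only that couplet, is amplified in its respective pool.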

The nucleic acid molecules of desired sequence assembled by the methods can be any nucleic acid molecule or nucleic acid construct. Examples include, but are not limited to, plasmids, genes or gene families, regulatory sequences, artificial chromosomes, vector sequences, a functional gene, a CRISPR/Cas9 gRNA template, or a genome of a virus, bacteria, or algae. The construct can also be a DNA-based information storage molecule, where information is stored in the form of the number and order of nitrogenous bases with each set of bases indicating a meaningful character or value that can be deciphered into an understandable language. In some embodiments the nucleic acid molecule produced by the methods does not contain any regulatory, non-coding, or extraneous sequences that are not a part of the natural gene, i.e., the nucleic acid molecule is produced without nucleotide artifacts from the assembly and preparation procedure. Any of the nucleic acids assembled by the methods can also have one or more non-standard bases, such as 3-nitropyrrole, 5-nitroindole, deoxyuridine, and others, that have a corresponding enzyme that will cleave the site or mark it for cleavage by another enzyme.

In some embodiments the methods can be performed in one step, meaning that the oligonucleotide pairs are formed and assembled in a single container without having to open the container once the components for the assembly are added. In another embodiment the method is an automated method, meaning that no human actions are required after the components of the assembly method are placed into the container and the assembly reaction is initiated; instead the reaction goes to completion by automated methods and without further human action.

Error Suppression

Another benefit provided by the present invention is the ability to suppress sequence errors. Because the method involves the annealing of oligonucleotide pairs into couplets, a natural selection against errors or sequence mismatches occurs because error-free oligos are more likely to anneal to each other and form a couplet than oligonucleotides that contain one or more errors or sequence mismatches. Error rates in the overlap sequence are at least 35% or at least 39% or at least 40% lower than in non-overlap areas of a couplet. That is, error rates in overlap areas are 65% or less or 61% or less or 60% or less of those in non-overlap areas of a couplet. Thus, the methods are also methods for reducing or minimizing errors in an assembled nucleic acid molecule versus error rates with assembly methods that do not involve a step of couplet formation. In various embodiments the nucleic acids produced by the present methods have an error rate of less than 1.5% or less than 1.3% or less than 1.2% or less than 1.1% or less than 1.0% or less than 0.9% or less than 0.8% or less than 0.7% or less than 0.6% or less than 0.55% or less than 0.50% or less than 0.45% or less than 0.3% or less than 0.2% or less than 0.1%. Therefore, and as illustrated in FIG. 6, the methods are able to suppress errors made at the synthesis stage of the oligonucleotides. Accordingly, the invention also provides methods of synthesizing nucleic acid molecules of a desired sequence disclosed herein having an error rate disclosed herein.

Compositions

The invention also provides compositions useful for conducting the methods of the invention. In one aspect the invention provides a composition comprising a plurality of oligonucleotides formed into couplets. Each couplet can comprise a portion of a desired nucleic acid sequence, and the couplets comprise an internal sequence that comprises a portion of the desired nucleic acid sequence. When the plurality of couplets is arranged in schematic or illustrative order, adjacent to each other and according to their internal sequences, they comprise at least a portion of a desired nucleic acid sequence. In some embodiments each couplet also has a 3′ or a 5′ primer binding sequence, and each couplet can contain a sequence that overlaps and is complementary to a portion of a sequence from at least one adjacent couplet. The desired nucleic acid sequence has a 3′ end and a 5′ end, and in some embodiments the couplets that make up the 3′ and 5′ ends of the desired sequence can also have a universal 3′ flanking sequence and a universal 5′ flanking sequence, respectively. The couplets in the composition can be any couplets described herein. In some embodiments the composition also contains a DNA polymerase and/or dNTPs. In some embodiments the composition is contained in a single container. The composition can further include an effective amount of a preservative. Any suitable preservative can be used such as, for example, glycerol. When glycerol is chosen as the preservative it can be present at a concentration of at least 20% or at least 30% or at least 40% or at least 50% w/w. The composition can also be present in a suitable buffer or in water. In some embodiments the couplets comprise at least 50% of the oligonucleotides in the mixture, and one or both of the oligonucleotides that form the couplets overlap at least 33% of their sequence to form the couplet.

Example 1 Assembly of 500 bp Constructs

This example illustrates the method in the partitioning and assembly of 24 DNA constructs of about 500 bp each from an oligo pool. The starting oligonucleotides were synthesized on a microarray and harvested into the oligo pool. About 10 of the oligos, when arranged adjacent to each other and in order, comprise one of the 500 bp DNA constructs to be assembled. The oligos were each about 78 bp in length, having a 30 bp overlap with the adjacent oligonucleotide, and contained 18 bp of poly-A 3′ and/or 5′ flanking primer binding sites. The harvested oligo pool was diluted to a 0.25-0.5 nM average for each oligo. 24 pairs of primers containing dU and a thermostable DNA polymerase that was uracil-literate were added to the mixture.

The mixture was subjected to a PCR procedure according to the protocol shown below with partitioning primers containing dUs and using VERASEQ® Ultra DNA polymerase (VSU). The product was then diluted 2× and the flanking sequences removed by scarless flank removal, with a mixture of UDG, endonuclease VIII, and exonuclease T, with an incubation for 15 min at 37° C. and 1 h at 25° C. This was then followed by 12 cycles of PCA, according to the protocol shown below. A second amplification reaction (PCR2/PCA2) was then performed for 30 cycles, as shown below using 2×PHUSION® master mix.

The result is shown in FIGS. 2a and 2b, which show the result of the assembly reaction for 24 constructs at two different concentrations. 23 out of 24 of the 500 bp constructs were successfully assembled, as evidenced by the bands in the expected location for 500 bp constructs.

PCR1 Program at 43-50° C.

1. 98° C. 1 min
2. 98° C. 30 sec
3. 43° C. 30 sec
4. 45° C. 30 sec, add 15 sec/cycle
5. Go to 2, 5×
6. 98° C. 30 sec
7. 45° C. 30 sec, add 15 sec/cycle
8. 47° C. 30 sec
9. Go to 6, 5×
10. 98° C. 30 sec
11. 50° C. 1 min, add 15 sec/cycle
12. Go to 10, 8×
13. 72° C. 5 min
14. 10° C. 0 (forever)

PCA Program (12 Cycles)

1. 98° C. 1 min
2. 98° C. 30 sec
3. 43° C. 30 sec
4. 45° C. 30 sec, add 15 sec/cycle
5. Go to 2, 5×
6. 98° C. 30 sec
7. 45° C. 30 sec, add 15 sec/cycle
8. 47° C. 30 sec
9. 52° C. 30 sec
10. Go to 6, 5×
11. 72° C. 5 min
12. 10° C. 0 (forever)

PCR2 Program 45-55° C. (30 Cycles)

1. 98° C. 1 min
2. 98° C. 30 sec
3. 50° C. 30 sec
4. 45° C. 30 sec, add 15 sec/cycle
5. Go to 2, 9×
6. 98° C. 30 sec
7. 48° C. 30 sec
8. 52° C. 30 sec, add 15 sec/cycle
9. Go to 6, 9×
10. 98° C. 30 sec
11. 55° C. 1 min and 30 sec, add 15 sec/cycle
12. Go to 10, 9×
13. 72° C. 5 min
14. 10° C. 0 (forever)
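As an illustrative check only, the cycle counts of the PCA and PCR2 programs above can be reproduced with a short Python sketch. The sketch assumes that a step of the form "Go to N, k×" means the block is run once and then repeated k additional times, as is common on thermocyclers; this reading is an assumption and is not stated in the programs. The names count_cycles, PCA, and PCR2 are hypothetical, and the per-cycle time increments are omitted because they do not affect the count.

```python
# Each step is ("temp", degrees_C, seconds) or ("goto", target_step, extra_repeats).
# Step numbers are 1-based, as in the listed programs. A hold time of None
# stands for the final indefinite hold.
PCA = [
    ("temp", 98, 60), ("temp", 98, 30), ("temp", 43, 30), ("temp", 45, 30),
    ("goto", 2, 5),
    ("temp", 98, 30), ("temp", 45, 30), ("temp", 47, 30), ("temp", 52, 30),
    ("goto", 6, 5),
    ("temp", 72, 300), ("temp", 10, None),
]

PCR2 = [
    ("temp", 98, 60), ("temp", 98, 30), ("temp", 50, 30), ("temp", 45, 30),
    ("goto", 2, 9),
    ("temp", 98, 30), ("temp", 48, 30), ("temp", 52, 30),
    ("goto", 6, 9),
    ("temp", 98, 30), ("temp", 55, 90),
    ("goto", 10, 9),
    ("temp", 72, 300), ("temp", 10, None),
]


def count_cycles(program):
    """Walk the program and count completed passes through looped blocks.

    Assumption: "Go to N, k x" jumps back k times, so each block
    runs k + 1 times in total.
    """
    remaining = {}  # index of goto step -> jumps still to make
    cycles = 0
    i = 0
    while i < len(program):
        step = program[i]
        if step[0] == "goto":
            cycles += 1  # arriving at a goto means one pass of the block finished
            _, target, repeats = step
            left = remaining.get(i, repeats)
            if left > 0:
                remaining[i] = left - 1
                i = target - 1  # convert 1-based step number to list index
                continue
        i += 1
    return cycles


print("PCA cycles:", count_cycles(PCA))
print("PCR2 cycles:", count_cycles(PCR2))
```

Under the stated assumption the sketch gives 12 cycles for the PCA program and 30 cycles for the PCR2 program, which agrees with the cycle counts given in the headings of those programs.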

Example 2 Assembly of a 1.2 kb Construct

This example shows the assembly of a 1.2 kb DNA construct from oligo pools with 16, 32, and 58 oligonucleotides according to the invention. The Example also shows the robustness of the method of the invention.

Fifty-eight oligonucleotides were designed to comprise the DNA molecule of desired sequence to be assembled. The oligonucleotides were each approximately 78 bp in length and had about a 30 bp overlap with the adjacent oligo, plus an 18 bp flanking sequence on the 3′ or 5′ end.

The oligos were divided into three pools as follows:

a. Pool 1: The oligos were placed into four sub-pools with 16, 16, 16, and 10 oligos, resulting in a concentration of 5 nM for each oligo.
b. Pool 2: The oligos were placed into two sub-pools with 32 and 26 oligos, resulting in a concentration of 2.5 nM for each oligo.
c. Pool 3: All 58 oligos were provided as one pool at 1.25 nM concentration for each oligo.

A PCR procedure was set up using the protocol in Example 1, with primers containing dUs and VERASEQ® Ultra DNA polymerase (VSU). The couplets for each of the pools were subjected to scarless flank removal, as described above, and the 3′ or 5′ flanks removed using a mixture of UDG, Endo VIII, and exonuclease T per 10 µl of reaction. Twelve cycles of PCA were performed, again as described in Example 1. PCR was performed on the assembled construct using 2×PHUSION® master mix and the same PCR protocol in Example 1. PCR was performed again with 2 µl of the PCA reaction as described above.

Table 1

Pool #   # of sub-pools   # of oligos in a sub-pool   Oligo conc. (nM)   # of cPCR   cPCR as one   SFR treatment   PCA assembly   1.2 kb formed
1        4                16, 16, 16, 10              5                  4           yes           yes             yes            yes
2        2                32, 26                      2.5                2           yes           yes             yes            yes
3        1                58                          1.25               1           n/a           yes             yes            yes

As illustrated in Table 1 above, the method assembled the 1.2 kb construct from oligo pools in all three groups, demonstrating the robustness of the method. FIG. 7 provides a gel demonstrating the assembly of the 1.2 kb product.

Although the disclosure has been described with reference to the above examples, it will be understood that modifications and variations are encompassed within the spirit and scope of the disclosure. Accordingly, the disclosure is limited only by the following claims.

What is claimed is:
 1. A method of removing primer binding sequences from a dsDNA molecule comprising: contacting a dsDNA molecule with an enzyme mixture comprising an enzyme that specifically cleaves primer binding sequences at a non-standard base; wherein the dsDNA molecule comprises a 3′ and a 5′ primer binding sequence having at least one non-standard base; wherein the dsDNA molecule further comprises a universal 3′ flanking sequence and a universal 5′ flanking sequence comprised to the inside of the 3′ and 5′ primer binding sequences, respectively; to thereby remove the primer binding sequences from the dsDNA molecule.
 2. The method of claim 1 wherein the non-standard base is deoxyuridine.
 3. The method of claim 1 wherein the enzyme that specifically cleaves primer binding sequences at a non-standard base is uracil DNA-glycosylase (UDG).
 4. The method of claim 3 wherein the enzyme mixture further comprises endonuclease VIII and exonuclease T.
 5. The method of claim 1 wherein the primer binding sequences are 6-30 nucleotides in length.
 6. The method of claim 5 wherein the 3′ and 5′ primer binding sequences are present on the 3′ and 5′ ends, respectively, of the dsDNA molecule.
 7. The method of claim 1 wherein the dsDNA molecule does not comprise an expressed sequence tag.
 8. The method of claim 1 wherein the non-standard base is selected from the group consisting of: 3-nitropyrrole and 5-nitroindole.
 9. A composition comprising: a DNA polymerase, dNTPs, and a plurality of oligonucleotides formed into couplets, wherein each couplet comprises an internal sequence that comprises a portion of the desired nucleic acid sequence; wherein when the plurality of couplets is arranged in order, adjacent to each other and according to their internal sequences they comprise at least a portion of a desired nucleic acid sequence, and each couplet further comprises a 3′ or a 5′ primer binding sequence, and each couplet contains a sequence that overlaps and is complementary to a portion of a sequence from an adjacent couplet; and wherein the desired nucleic acid sequence has a 3′ end and a 5′ end, and the couplets that comprise the 3′ and 5′ ends of the desired sequence further comprise a universal 3′ flanking sequence and a universal 5′ flanking sequence, respectively.
 10. The composition of claim 9 comprised in a single container and further comprising an effective amount of a preservative.
 11. The composition of claim 10 wherein the couplets comprise at least 50% of the oligonucleotides in the mixture, and the couplets overlap at least 33% of their sequences.
 12. The composition of claim 9 wherein the universal 3′ and 5′ flanking sequences are comprised to the inside of the 3′ and 5′ primer binding sequences, respectively.
 13. The composition of claim 9 wherein the primer binding sequences comprise at least one non-standard base.
 14. The composition of claim 13 wherein the non-standard base is deoxyuridine.
 15. The composition of claim 9 wherein the 3′ and 5′ primer binding sequences on an oligonucleotide couplet are not complementary to each other.
 16. The composition of claim 9 wherein the oligonucleotides comprise from 60 to 100 nucleotides and the primer binding sequences comprise from 8 to 30 nucleotides.
 17. The composition of claim 9 wherein the 3′ and 5′ primer binding sequences do not comprise a restriction site for a restriction enzyme.
 18. The composition of claim 9 further comprising uracil DNA glycosylase, endonuclease VIII, and exonuclease T.